library(GO.db)
library(graph)
library(dnet)
library(magrittr)
library(tidyverse)
library(pander)
library(scales)
library(plotly)
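The aim of this page is to keep an up-to-date summary of the Gene Ontology database, as represented in the R package GO.db. For each GO term, the summary records the shortest and longest paths back to the root node, whether the term is a terminal node, and the ontology it belongs to.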
The main object created is available for download here, and will load as a tibble when using the function read_rds() (from the package readr).
The package AnnotationDbi contains the function makeGOGraph(), which returns a graphNEL graph. In the following, we will:
- Create a graph for each ontology using makeGOGraph()
- Remove the node "all", as this is essentially redundant
- Process each graph using functions from the package dnet. Whilst hosted on CRAN, this package depends on the Bioconductor package supraHex and should be installed using BiocManager::install("dnet")

Note that the following code can take several minutes to run as these are large graphs.
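# Build one graph per ontology, drop the redundant "all" node,
# then apply dDAGreverse() from dnet to each graph
graphs <- c(BP = "bp", CC = "cc", MF = "mf") %>%
  lapply(makeGOGraph) %>%
  lapply(function(x){removeNode("all", x)}) %>%
  lapply(dDAGreverse)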
| Ontology | Nodes | Edges |
|---|---|---|
| BP | 29,699 | 71,745 |
| CC | 4,202 | 7,534 |
| MF | 11,148 | 13,681 |
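goSummaries <- lapply(graphs, function(x){
  # Convert node levels to path lengths (steps back to the root node)
  lng <- dDAGlevel(x, "longest_path") - 1
  shrt <- dDAGlevel(x, "shortest_path") - 1
  # Terminal nodes of the graph
  tips <- dDAGtip(x)
  tibble(
    id = unique(c(names(lng), names(shrt))),
    # Index by id so both vectors follow the same term order
    shortest_path = shrt[id],
    longest_path = lng[id],
    terminal_node = id %in% tips
  )
}) %>%
  bind_rows() %>%
  mutate(ontology = Ontology(id))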
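To illustrate what the two path lengths and the terminal-node flag capture, the following is a minimal base R sketch on a hypothetical five-node DAG. The term names, the `edges` table and the helper `path_length()` are invented for illustration and are not part of GO.db or dnet; the real summaries are computed with dDAGlevel() and dDAGtip() as above.

```r
# Hypothetical toy DAG: each row is one child -> parent edge
edges <- data.frame(
  child  = c("A", "B", "C", "C", "D"),
  parent = c("root", "root", "A", "root", "C"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not from dnet): recursively count the shortest or
# longest number of steps from a term back to the root
path_length <- function(term, edges, type = c("shortest", "longest")) {
  type <- match.arg(type)
  if (term == "root") return(0)
  parents <- edges$parent[edges$child == term]
  steps <- vapply(parents, path_length, numeric(1), edges = edges, type = type)
  1 + if (type == "shortest") min(steps) else max(steps)
}

terms <- c("A", "B", "C", "D")
toy <- data.frame(
  id = terms,
  shortest_path = vapply(terms, path_length, numeric(1),
                         edges = edges, type = "shortest"),
  longest_path = vapply(terms, path_length, numeric(1),
                        edges = edges, type = "longest"),
  # A terminal node is never the parent of another term
  terminal_node = !terms %in% edges$parent,
  row.names = NULL,
  stringsAsFactors = FALSE
)
toy
```

In this toy graph, term C connects to the root both directly and via A, so its shortest path is 1 step while its longest path is 2 steps. This is why the two measures can lead to different filtering decisions for the same term.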
Path lengths for each ontology, based on whether a term is a terminal node or not.
Cumulative number of GO Terms with paths \(\geq\) x.
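As a rough sketch of how cumulative counts of this kind could be tabulated, the base R example below counts the terms with a shortest path \(\geq\) x for a range of x within each ontology. The data frame `toy` is a hypothetical stand-in for goSummaries with made-up values, and the range of cut-offs is an arbitrary assumption.

```r
# Hypothetical stand-in for goSummaries, with only the columns needed here
toy <- data.frame(
  id = paste0("term", 1:6),
  ontology = c("BP", "BP", "BP", "BP", "MF", "MF"),
  shortest_path = c(1, 2, 2, 4, 1, 3),
  stringsAsFactors = FALSE
)

# For each ontology, count the terms with shortest_path >= x over a range of x
cutoffs <- 0:4
cumulative <- do.call(rbind, lapply(split(toy, toy$ontology), function(df) {
  data.frame(
    ontology = df$ontology[1],
    x = cutoffs,
    n_terms = vapply(cutoffs, function(x) sum(df$shortest_path >= x), integer(1)),
    stringsAsFactors = FALSE
  )
}))
rownames(cumulative) <- NULL
cumulative
```

The same approach should carry over to the longest_path column by swapping the column name.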
In practice, we can simply add this table to the results table from GO analysis tools like goana() and use it to filter the results before adjusting p-values.
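A minimal sketch of that workflow in base R is given below. Both data frames are hypothetical: `goResults` only mimics the P.DE column of goana() output (real output has further columns and stores the GO IDs as row names rather than in an `id` column), `toySummaries` stands in for goSummaries, and the cut-off of 4 and the choice of FDR adjustment are assumptions made for the example.

```r
# Hypothetical enrichment results with made-up IDs and p-values
goResults <- data.frame(
  id = c("GO:A", "GO:B", "GO:C", "GO:D"),
  P.DE = c(0.001, 0.02, 0.03, 0.5),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for goSummaries
toySummaries <- data.frame(
  id = c("GO:A", "GO:B", "GO:C", "GO:D"),
  shortest_path = c(1, 5, 6, 7),
  stringsAsFactors = FALSE
)

# Add the path lengths to the enrichment results
merged <- merge(goResults, toySummaries, by = "id")

# Discard terms close to the root before any adjustment,
# so they do not count towards the number of tests
filtered <- merged[merged$shortest_path > 4, ]

# Adjust p-values using only the retained terms
filtered$FDR <- p.adjust(filtered$P.DE, method = "fdr")
filtered
```

Filtering first means the adjustment is applied to three tests rather than four in this toy example. With real goana() output, the GO IDs would first need to be moved from the row names into a column before merging.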
As an example of how to use this to assist our decision making, if we chose to remove GO terms with a shortest path \(\leq\) 4, we can see how many terms we would discard and retain.
ggplotly(goSummaries %>%
           mutate(keep = shortest_path > 4) %>%
           ggplot(aes(keep, fill = terminal_node)) +
           geom_bar() +
           facet_wrap(~ontology, nrow = 1) +
           scale_y_continuous(labels = comma) +
           labs(x = "Term Retained", y = "Total") +
           theme_bw())
goSummaries %>%
  mutate(keep = shortest_path > 4) %>%
  group_by(ontology, terminal_node, keep) %>%
  tally() %>%
  spread(key = keep, value = n) %>%
  rename(Discard = `FALSE`,
         Retain = `TRUE`) %>%
  bind_rows(
    tibble(ontology = "**Total**",
           Discard = sum(.$Discard),
           Retain = sum(.$Retain))) %>%
  pander(big.mark = ",",
         justify = "llrr")
| ontology | terminal_node | Discard | Retain |
|---|---|---|---|
| BP | FALSE | 8,149 | 9,346 |
| BP | TRUE | 4,200 | 8,004 |
| CC | FALSE | 1,178 | 280 |
| CC | TRUE | 2,093 | 651 |
| MF | FALSE | 987 | 1,084 |
| MF | TRUE | 2,142 | 6,935 |
| Total | NA | 18,749 | 26,300 |
Alternatively, we could remove GO terms where the longest path back to the root node is \(\leq 5\) steps.
ggplotly(goSummaries %>%
           mutate(keep = longest_path > 5) %>%
           ggplot(aes(keep, fill = terminal_node)) +
           geom_bar() +
           facet_wrap(~ontology, nrow = 1) +
           scale_y_continuous(labels = comma) +
           labs(x = "Term Retained", y = "Total") +
           theme_bw())
goSummaries %>%
  mutate(keep = longest_path > 5) %>%
  group_by(ontology, terminal_node, keep) %>%
  tally() %>%
  spread(key = keep, value = n) %>%
  rename(Discard = `FALSE`,
         Retain = `TRUE`) %>%
  bind_rows(
    tibble(ontology = "**Total**",
           Discard = sum(.$Discard),
           Retain = sum(.$Retain))) %>%
  pander(big.mark = ",",
         justify = "llrr")
| ontology | terminal_node | Discard | Retain |
|---|---|---|---|
| BP | FALSE | 3,400 | 14,095 |
| BP | TRUE | 889 | 11,315 |
| CC | FALSE | 407 | 1,051 |
| CC | TRUE | 562 | 2,182 |
| MF | FALSE | 1,227 | 844 |
| MF | TRUE | 5,837 | 3,240 |
| Total | NA | 12,322 | 32,727 |
The summaries obtained above can be downloaded from here. Place the file in the appropriate folder and the object can then be imported using read_rds("path/to/goSummaries.RDS").
R version 3.5.2 (2018-12-20)
**Platform:** x86_64-pc-linux-gnu (64-bit)
locale: LC_CTYPE=en_AU.UTF-8, LC_NUMERIC=C, LC_TIME=en_AU.UTF-8, LC_COLLATE=en_AU.UTF-8, LC_MONETARY=en_AU.UTF-8, LC_MESSAGES=en_AU.UTF-8, LC_PAPER=en_AU.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_AU.UTF-8 and LC_IDENTIFICATION=C
attached base packages: parallel, stats4, stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: bindrcpp(v.0.2.2), plotly(v.4.8.0), scales(v.1.0.0), pander(v.0.6.3), forcats(v.0.3.0), stringr(v.1.3.1), dplyr(v.0.7.8), purrr(v.0.2.5), readr(v.1.3.1), tidyr(v.0.8.2), tibble(v.2.0.0), ggplot2(v.3.1.0), tidyverse(v.1.2.1), magrittr(v.1.5), dnet(v.1.1.4), supraHex(v.1.20.0), hexbin(v.1.27.2), igraph(v.1.2.2), graph(v.1.60.0), GO.db(v.3.7.0), AnnotationDbi(v.1.44.0), IRanges(v.2.16.0), S4Vectors(v.0.20.1), Biobase(v.2.42.0) and BiocGenerics(v.0.28.0)
loaded via a namespace (and not attached): nlme(v.3.1-137), lubridate(v.1.7.4), bit64(v.0.9-7), httr(v.1.4.0), Rgraphviz(v.2.26.0), tools(v.3.5.2), backports(v.1.1.3), R6(v.2.3.0), DBI(v.1.0.0), lazyeval(v.0.2.1), colorspace(v.1.3-2), withr(v.2.1.2), tidyselect(v.0.2.5), bit(v.1.1-14), compiler(v.3.5.2), cli(v.1.0.1), rvest(v.0.3.2), Cairo(v.1.5-9), xml2(v.1.2.0), labeling(v.0.3), digest(v.0.6.18), rmarkdown(v.1.11), pkgconfig(v.2.0.2), htmltools(v.0.3.6), highr(v.0.7), htmlwidgets(v.1.3), rlang(v.0.3.0.1), readxl(v.1.2.0), rstudioapi(v.0.8), RSQLite(v.2.1.1), shiny(v.1.2.0), bindr(v.0.1.1), generics(v.0.0.2), jsonlite(v.1.6), crosstalk(v.1.0.0), Matrix(v.1.2-15), Rcpp(v.1.0.0), munsell(v.0.5.0), ape(v.5.2), stringi(v.1.2.4), yaml(v.2.2.0), MASS(v.7.3-51.1), plyr(v.1.8.4), grid(v.3.5.2), blob(v.1.1.1), promises(v.1.0.1), crayon(v.1.3.4), lattice(v.0.20-38), haven(v.2.0.0), hms(v.0.4.2), knitr(v.1.21), pillar(v.1.3.1), reshape2(v.1.4.3), glue(v.1.3.0), evaluate(v.0.12), data.table(v.1.11.8), modelr(v.0.1.2), httpuv(v.1.4.5.1), cellranger(v.1.1.0), gtable(v.0.2.0), assertthat(v.0.2.0), xfun(v.0.4), mime(v.0.6), xtable(v.1.8-3), broom(v.0.5.1), later(v.0.7.5), viridisLite(v.0.3.0) and memoise(v.1.1.0)